Pattern Directed Mining of Sequence Data
Authors
Abstract
Sequence data arise naturally in many applications, and can be viewed as an ordering of events, where each event has an associated time of occurrence. An important characteristic of event sequences is the occurrence of episodes, i.e., collections of events occurring in a certain pattern. Of special interest are frequent episodes, i.e., episodes occurring with a frequency above a certain threshold. In this paper, we study the problem of mining for frequent episodes in sequence data. We present a framework for efficient mining of frequent episodes which goes beyond previous work in a number of ways. First, we present a language for specifying episodes of interest. Second, we describe a novel data structure, called the sequential pattern tree (SP Tree), which captures the relationships specified in the pattern language in a very compact manner. Third, we show how this data structure can be used by a standard bottom-up mining algorithm to generate frequent episodes efficiently. Finally, we show how the SP Tree can be optimized by sharing common conditions and evaluating each such expression only once. We present the results of an evaluation of the proposed techniques.
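To make the notions of episode and frequency threshold concrete, the following is a minimal sketch, not the paper's method. It assumes a serial episode (event types that must occur in a given order) and a sliding-window definition of frequency, neither of which the abstract specifies; the SP Tree and the pattern language are not modeled. The names `contains_serial_episode`, `episode_frequency`, `is_frequent`, and the `width` and `threshold` parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# An event is a pair (time of occurrence, event type).
Event = Tuple[int, str]


def contains_serial_episode(types: Sequence[str], episode: Sequence[str]) -> bool:
    """True if the episode's event types occur in `types` in order
    (not necessarily adjacent)."""
    it = iter(types)
    # Each `any` consumes the shared iterator up to the first match,
    # so later targets can only match later events.
    return all(any(t == target for t in it) for target in episode)


def episode_frequency(sequence: List[Event], episode: Sequence[str], width: int) -> float:
    """Fraction of sliding time windows [s, s + width) containing the episode.

    Assumption: frequency is window-based; the source does not define it.
    """
    if not sequence or width <= 0:
        return 0.0
    ordered = sorted(sequence)  # order by time of occurrence
    first, last = ordered[0][0], ordered[-1][0]
    # Every window that overlaps at least one event time.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for s in starts:
        types = [etype for time, etype in ordered if s <= time < s + width]
        if contains_serial_episode(types, episode):
            hits += 1
    return hits / len(starts)


def is_frequent(sequence: List[Event], episode: Sequence[str],
                width: int, threshold: float) -> bool:
    """An episode is frequent if its frequency exceeds the threshold."""
    return episode_frequency(sequence, episode, width) > threshold


if __name__ == "__main__":
    events = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]
    print(episode_frequency(events, ("A", "B"), width=3))
    print(is_frequent(events, ("A", "B"), width=3, threshold=0.4))
```

This brute-force version rescans every window, which is exactly the kind of repeated work that a bottom-up algorithm with shared condition evaluation would aim to avoid.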
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high-utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
A Proposition for Sequence Mining Using Pattern Structures
In this article we present a novel approach to rare sequence mining using pattern structures. Particularly, we are interested in mining closed sequences, a type of maximal sub-element which allows providing a succinct description of the patterns in a sequence database. We present and describe a sequence pattern structure model in which rare closed subsequences can be easily encoded. We also pro...
Survey on Sequence Discovery Using DNA Sequence Mining Data
Sequence mining is one of the most commonly used techniques in data mining. It is the process of mining frequent patterns from large datasets. The existing algorithms have some limitations in predicting frequent patterns, in terms of time, space complexity and accuracy. To overcome these drawbacks, this paper studies existing sequence mining algorithms and generates a new...
Establishing relationships among patterns in stock market data
Similarities among subsequences are typically regarded as categorical features of sequential data. We introduce an algorithm for capturing the relationships among similar, contiguous subsequences. Two time series are considered to be similar during a time interval if every contiguous subsequence of a predefined length satisfies the given similarity criterion. Our algorithm identifies patterns b...
Behaviour Recovery and Complicated Pattern Definition in Web Usage Mining
Data mining includes four steps: data preparation, pattern mining, pattern analysis, and pattern application. In the web environment, however, user activities become much more complex because of the complex web structure. User behaviour recovery and pattern definition therefore play more important roles in web mining than in other applications. In this paper, we gave a new view on behaviour recovery and c...